Predicting an Author's Demographics from Text using Topic Modeling Approach

نویسندگان

  • Hafiz Rizwan Iqbal
  • Muhammad Adnan Ashraf
  • Rao Muhammad Adeel Nawab
چکیده

The paper presents an approach to predict personality traits of a writer for the author profiling task of the PAN CLEF 2015. The task aimed at predicting authors’ demographics based on the written tweets of an author. These demographics included traditional authorship attributes of age, gender and various personality traits of an author. We applied topic modeling using LDA as baseline approach and used the generated topic to get hierarchical probabilities of the topics. J48 decision tree was used for training classification model. The trained models were then used to successfully predict the demographics of training and test datasets

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms to can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and many more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

متن کامل

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...

متن کامل

A review of text mining approaches and their function in discovering and extracting a topic

Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling.  Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...

متن کامل

Predicting distribution of Eurasian Lynx (Lynx lynx) using an ensemble modeling approach: A Case Study: Saveh Zarandieh Kharaghan Area, Markazi Province

Adequate knowledge about suitable habitats for wildlife is essential to prevent habitat destruction and extinction of species and for their conservation and management. The Eurasian lynx is one of the mostly distributed cats in Asia. In this study, we applied an ensemble habitat suitability modeling approach, using ten predictor variables to model Eurasian Lynx’s habitat suitability in Saveh Za...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015